CAM based system and method for re-sequencing data packets

ABSTRACT

An embodiment of the system operates in a parallel packet switch architecture having at least one egress adapter arranged to receive data packets issued from a plurality of ingress adapters and switched through a plurality of independent switching planes. Each received data packet belongs to one sequence of data packets among a plurality of sequences where the data packets are numbered with a packet sequence number (PSN) assigned according to at least a priority level of the data packet. Each data packet received by the at least one egress adapter has a source identifier to identify the ingress adapter from which it is issued. The system for restoring the sequences of the received data packets operates within the egress adapter and comprises a buffer for temporarily storing each received data packet at an allocated packet buffer location, a controller, and a determination means coupled to a storing means and an extracting means.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 10/723,834 entitled “CAM Based System and Method for Re-Sequencing Data Packets”, filed Nov. 26, 2003, now U.S. Pat. No. 7,400,629, the disclosure of which is incorporated herein in its entirety for all purposes.

FIELD

The present invention relates to high speed switching in general and more particularly to a system and method to restore the sequence of data packets switched through independent planes of a Parallel Packet Switch.

BACKGROUND

DWDM, which stands for Dense Wavelength Division Multiplexing, by merging many wavelengths onto a single optical fiber, is making available long-haul fiber-optic data communications links of huge aggregate capacity. Each wavelength is an independent communications channel which typically operates at OC48c, i.e., 2.5 Giga (10^9) bits per second (Gbps), OC192c (10 Gbps) and, in some systems, at OC768c (40 Gbps). These rates are part of a family of rates and formats available for use in optical interfaces, generally referred to as SONET, which is a standard defined by the American National Standards Institute (ANSI) of which there exists a European counterpart, mostly compatible, known as SDH (Synchronous Digital Hierarchy). Thus, at each node of a network, the data packets or cells carried on each DWDM channel must be switched, or routed, by packet-switches that process and then switch packets between different channels so as to forward them towards their final destination. Ideally, it would be desirable to keep the processing of packets in the optical domain, without conversion to electronic form; this is still not really feasible today, mainly because all packet-switches need buffering that is not yet available in an optical form. So packet-switches will continue to use electronic switching technology and buffer memories for some time to come.

However, because of the data rates quoted above for individual DWDM channels (up to 40 Gbps) and the possibility of merging tens, if not hundreds, of such channels onto a single fiber, the throughput to handle at each network node can become enormous, i.e., in a multi Tera (10^12) bits per second range (Tbps), making buffering and switching, in the electronic domain, an extremely challenging task. While constant significant progress has been sustained, for decades, in the integration of ever more logic gates and memory bits on a single ASIC (Application Specific Integrated Circuit), allowing the implementation of the complex functions required to handle the data packets flowing into a node according to QoS (Quality of Service) rules, the progress in speed and performance of the logic devices over time is, unfortunately, comparatively slow, and is now gated by the power one can afford to dissipate in a module to achieve it. Especially, the time to perform a random access into an affordable memory, e.g., an embedded RAM (Random Access Memory) in a standard CMOS (Complementary MOS) ASIC, is decreasing only slowly with time, while switch ports need to interface channels whose speed quadruples at each new generation, i.e., from OC48c to OC192c and to OC768c, respectively from 2.5 to 10 and 40 Gbps. For example, if a memory is 512-bit wide, allowing a typical fixed-size 64-byte (8-bit byte) packet of the kind handled by a switch to be stored or fetched in a single write or read operation, this must be achieved in less than 10 nanoseconds (ns, 10^-9 s) for a 40 Gbps channel and, in practice, in a few ns only in order to take care of the necessary speed overhead needed to sustain the specified nominal channel performance, while at least one store and one fetch, i.e., two operations, are always necessary per packet movement. This represents, nowadays, the upper limit at which memories and CMOS technology can be cycled, making the design of a multi Tbps-class switch extremely difficult with a cost-performance state-of-the-art technology such as CMOS, since memories can only be operated at a speed comparable to the data rate of the channel they have to process.
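
To make this timing budget concrete, a back-of-the-envelope check using the figures just quoted (64-byte packets, a 40 Gbps channel, one store plus one fetch per packet movement) can be written as:

\[
t_{\text{packet}} = \frac{64 \times 8\ \text{bits}}{40 \times 10^{9}\ \text{bits/s}} = 12.8\ \text{ns},
\qquad
t_{\text{access}} \le \frac{t_{\text{packet}}}{2} = 6.4\ \text{ns},
\]

which is consistent with the "few ns" access time requirement stated above.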

Hence, to design and implement a high capacity packet-switch (i.e., having a multi Tbps aggregate throughput) from/to OC768c (40 Gbps) ports, a practical architecture often considered to overcome the above mentioned technology limitation is a Parallel Packet Switch (PPS) architecture. It is comprised of multiple identical lower-speed packet-switches, e.g., (100), operating independently and in parallel, as sketched in FIG. 1. In each ingress port adapter, such as (110), an incoming flow of packets (120) is spread (130), packet-by-packet, by a load balancer across the slower packet-switches, then recombined by a multiplexor (140) in the egress part of each port adapter, e.g., (150). As seen by an arriving packet, a PPS is a single-stage packet-switch that needs to have only a fraction of the performance necessary to sustain the port data rate. If four planes (100, 102, 104 and 106) are used, as shown in FIG. 1, they need only have one fourth of the performance that would otherwise be required to handle a full port data rate. More specifically, four independent switches, designed with OC192c ports, can be associated to offer OC768c port speed, provided that ingress and egress port adapters (110, 150) are able to load balance and recombine the packets. This approach is well known from the art and sometimes referred to as ‘Inverse Multiplexing’ or ‘load balancing’. Among many publications on the subject one may, e.g., refer to a paper published in Proc. ICC'92, 311.1.1-311.1.5, 1992, by T. ARAMAKI et al., entitled ‘Parallel “ATOM” Switch Architecture for High-Speed ATM Networks’, which discusses the kind of architecture considered here.

The above scheme is also attractive because of its inherent capability to support redundancy. By placing more planes than are strictly necessary, it is possible to hot replace a defective plane without having to stop traffic. When a plane is detected as being or becoming defective, ingress adapter load balancers can be instructed to skip the defective plane. When all the traffic from the defective plane has been drained out, it can be removed and replaced by a new one, and the load balancers set back to their previous mode of operation.

Thus, if PPS is really attractive to support multi-Gbps channel speeds, and more particularly OC768c switch ports, it remains that this approach introduces the problem of packet re-sequencing in the egress adapter. Packets from an input port (110) may possibly arrive out of sequence in a target egress adapter (150) because the various switching paths, here comprised of four planes (100), do not have the same transfer delay since they run independently and thus can have different buffering delays. A discussion of and proposed solutions to this problem can be found, for example, in a paper by Y. C. JUNG et al., ‘Analysis of out-of-sequence problem and preventive schemes in parallel switch architecture for high-speed ATM network’, published in IEEE Proc.-Commun., Vol. 141, No. 1, February 1994. However, this paper does not consider the practical case where the switching planes also have to handle packets on a priority basis so as to support a Class of Service (CoS) mode of operation, a mandatory feature in all recent switches, which are assumed to be capable of handling simultaneously all sorts of traffic at nodes of a single ubiquitous network handling carrier-class voice traffic as well as video distribution or just straight data file transfer. Hence, packets are processed differently by the switching planes depending on the priority tags they carry. This no longer complies with the simple FCFS (First-Come-First-Served) rule assumed by the above referenced paper and forces egress adapters to read out packets as soon as they are ready to be delivered by the switching planes, after which they can be resequenced on a per priority basis. Also, the above paper implicitly assumes the use of a true time stamp (TS), which means in practice that all port-adapters are synchronized so that packets from different sources are stamped from a common time reference, a difficult and expensive requirement to meet.

Another difficulty with a PPS architecture stems from the fact that networks must not only support UC (unicast) traffic (one source to one destination) but also MC (multicast) traffic, that is, traffic in which a source may have to send a same flow of packets to more than one destination. Video distribution and network management traffic are of this latter case (e.g., the IP suite of protocols assumes that some control packets must be broadcast). For example, with a 64-port switch there are only 64 UC flows (times the number of priorities) for each source since there are only 64 possible destinations. However, there may be anything from none to tens of thousands of MC flows to be supported in such a switch, each one being identified by a unique MCid (MC identifier) thus specifying to what particular combination of more than one destination a packet of a MC flow must be forwarded from a same source. Therefore, to overcome the problem introduced by the different transfer delays in the independent planes, a simple numbering of UC packets at source, i.e., in each ingress adapter, can be envisaged to allow re-sequencing in the egress adapters. This, however, does not fit with MC traffic because of the multiplicity of possible combinations of destinations from a same source. For example, MC packets numbered with a simple complete ascending sequence (n, n+1, n+2, etc.), sent from a same source and received in different combinations of egress adapters, as specified by their MCid, will generally create incomplete sequences of packet numbers since destinations are obviously not all the same from one MCid to another.
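
As a minimal illustration of why source numbering breaks down for multicast (a Python sketch; the destination sets below are hypothetical, chosen only for the example):

    # Four MC packets from one source, numbered 1..4, each carrying an MCid
    # that resolves to a set of destination ports.
    packets = [(1, {"A", "B"}), (2, {"A"}), (3, {"B"}), (4, {"A", "B"})]

    seen_by_a = [psn for psn, dests in packets if "A" in dests]
    print(seen_by_a)  # [1, 2, 4] -- destination A observes a gap at 3

    # Destination A cannot distinguish a packet delayed in some plane from one
    # that was simply never addressed to it, so the gap can never be resolved.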

Finally, in the context of a PPS switch, the traditional way of handling packet readout in the egress adapters no longer fits either. In a traditional single plane switch, no disordering in the delivery of the switched packets is introduced by the switching unit (other than the ‘disordering’ introduced by the handling of packets on the basis of their priorities). This allows forming LL's (linked lists) of packets, per priority, implicitly remembering their order of arrival and thus the order in which they must be forwarded within a priority class. Appending a new element to a LL, i.e., always to the LL tail, is a relatively easy task even though this must be done at the very high speeds previously mentioned. However, inserting a packet in the right place of a linked list is much more complicated. This requires first determining where the packet must be inserted, since packets are not guaranteed to be received in the right order, and then updating the links to the next and from the previous element.

Forming LL's has been the subject of numerous publications. For a discussion of this subject, so as to evaluate the difficulties encountered in carrying out in hardware, at the speed required by a Terabit-class switch, the insertion of a new element in a LL, one may refer, e.g., to a book by Robert Sedgewick, ‘Algorithms’, second edition, Addison-Wesley, 1988, ISBN 0-201-06673-4, and more specifically to chapter 3, ‘Elementary Data Structures’.

Thus, in view of the difficulties of prior art arrangements as mentioned here above, there is a need for a resequencing solution in order to make feasible a PPS architecture in which variable delays can be experienced in the individual switching planes while supporting priority classes of unicast and multicast traffic, in view of the implementation of a multi-Tbps switch.

The present invention offers such a solution.

SUMMARY

Embodiments of the invention may provide a system and method to restore sequences of data packets in the egress adapters of a parallel packet switch architecture.

Embodiments of the invention may support resequencing of unicast as well as multicast traffic with a unique mechanism having a common set of resources.

Embodiments of the invention may provide ingress adapters that neither need to be synchronized nor require the use of a true time stamp to mark the packets.

In an embodiment, the system operates in a parallel packet switch architecture having at least one egress adapter arranged to receive data packets issued from a plurality of ingress adapters and switched through a plurality of independent switching planes. Each received data packet belongs to one sequence of data packets among a plurality of sequences where the data packets are numbered with a packet sequence number (PSN) assigned according to at least a priority level of the data packet. Each data packet received by the at least one egress adapter further has a source identifier to identify the ingress adapter it is issued from. The system for restoring the sequences of the received data packets operates within the egress adapter and comprises means for temporarily storing each received data packet at an allocated packet buffer location. Furthermore, extracting means allow extraction of the packet sequence number, the source identifier and the priority level of each stored data packet. And determination means, coupled to the storing means and to the extracting means, allow determining, for each sequence of data packets, the order of the data packets to be output from the egress adapter.

In some embodiments, each of a plurality of ingress adapters comprises means (210) for numbering the unicast data packets according to the priority level and to the at least one egress adapter of each unicast data packet.

The resequencing system operates for each received data packet according to the following resequencing method: for each received data packet, a packet buffer location is allocated to the received data packet, which is temporarily stored at the allocated packet buffer location. A source-priority register is identified by the source identifier and the priority level of the stored data packet. The source-priority register contains a packet sequence number (PSN) and a packet buffer location identifier (ID) of a previously received data packet. The source-priority register is also associated to a valid-bit latch that indicates an active/not active status. In order to determine if the received data packet is to be output as the next data packet of a respective sequence of data packets, the status of the valid-bit latch is checked and the packet sequence number of the received data packet is compared with the packet sequence number contained within the pointed source-priority register.

Further objects, features and advantages of the present invention will become apparent to those skilled in the art upon examination of the following description in reference to the accompanying drawings. It is intended that any additional advantages be incorporated herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a conceptual view of a parallel packet switch system to implement the invention;

FIG. 2 is a block diagram showing the main components of a preferred embodiment of the invention;

FIG. 3 is a block diagram of the main components of the egress buffering of FIG. 2;

FIG. 4 details the resequencing CAM based mechanism implemented in the egress adapter;

FIG. 5 is a flow chart of the incoming packet process in the egress adapter;

FIG. 6 is a flow chart of the outgoing packet process in the egress adapter; and

FIG. 7 is a schematic view to illustrate the wrapping of the source counters.

DETAILED DESCRIPTION OF EMBODIMENTS

FIG. 2 shows a functional view of a PPS architecture according to the invention. For the sake of clarity, only one ingress adapter (200) is shown interfacing a plurality of switching planes (planes A to X under block 250) over which an incoming traffic (290) of data packets is load balanced by a load balancer circuit (205). The skilled man will easily understand through the reading of the entire description that all functional principles described for one ingress adapter may be generalized to a plurality of ingress adapters.

To allow the re-sequencing of data packets in the egress adapters (260), prior to or while load-balancing, all unicast packets are numbered per priority and per destination (2100 to 2163) in the ingress adapter. It is to be noted that the numbering performed for a unicast packet from one source towards one destination is completely unrelated to the numbering performed by the same source towards other destinations, and is also unrelated to the numbering performed by the other sources. This is possible because, as will be further detailed, each destination sorts the packets it receives per priority and per source, these belonging to independent flows.
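
A minimal sketch of this per-destination, per-priority numbering in an ingress adapter (Python, behavioral only; the function name and the unbounded counters are illustrative, and counter width and wrapping are deferred to FIG. 7):

    from collections import defaultdict

    # One independent counter per (destination, priority) pair, as held by the
    # unicast numbering logic (2100 to 2163) of a single ingress adapter.
    _counters = defaultdict(int)

    def assign_psn(destination: int, priority: int) -> int:
        """Return the next packet sequence number for this unicast flow."""
        _counters[(destination, priority)] += 1
        return _counters[(destination, priority)]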

In the preferred described PPS implementation, only the unicast traffic is load balanced, while multicast packets are sent by each source to their multiple destinations always through at least one pre-assigned switching plane (thus, multicast traffic flows are pre-assigned to specific switching planes, e.g., on the basis of groups of destination ports). Hence, there is no requirement to number MC packets at source since the invention preferably assumes that MC flows are always switched through a same switching plane, which does not introduce any disordering. Contrary to UC packets, MC packets are thus numbered at destination (275), in each egress adapter (260), so as to avoid the problem discussed in the background section on the numbering at sources of traffic with multiple destinations, while allowing the implementation of a single mechanism that works both for UC and MC traffic. This is further discussed hereafter and described in the following figures.

In practice, the numbering of MC packets at destinations can be carried out in different equivalent ways. MC packets can indeed be numbered on the basis of their actual source, i.e., the ingress adapter MC VOQ (230) from which they are issued. However, because MC traffic flows are assigned to specific planes, there is no real need to do so, and an alternate solution, that might be preferred, is to consider that the switching planes are actually, in each egress adapter, the sources of MC flows instead. Because there are generally, in a PPS structure, fewer planes than switch ports, this requires fewer resources in egress adapters. Obviously, whichever solution is adopted, the numbering must be performed per priority too. The rest of the description of the invention broadly refers to the source of MC packets as being, equivalently, either the MC VOQ in the ingress adapters or the switching planes. Thus, it is to be understood that a plurality of independent counters per source allocate a sequential packet number to each incoming data packet according to the priority level.

Then, packets received through the various planes (250) are temporarily stored in an Egress Buffer (265). As mentioned earlier, reading out the packets from the switch planes must be done without delay since planes are assumed to process packets on a per priority basis too and, in no case, should a packet of lower priority stay in the way of a higher priority packet, since this would create a priority HoL (head of line) blocking. As already stated above, the invention assumes that the counters used to rank unicast packets (2100 to 2163) are not required to be in synchronism in the various ingress adapters. Also, multicast packets are numbered (275) per plane (and per priority) when they arrive in the egress adapter. As a consequence, packets from different sources cannot (and need not) be compared to restore their sequence. In other words, the invention assumes that packet resequencing is not only performed independently on a per priority basis but as well on the basis of their source (270). Hence, packets are read out as soon as they are ready to leave the switch planes in order to perform resequencing in each egress adapter, where they need to be temporarily stored (265).

In a preferred implementation, the above mode of operation, i.e., resequencing per priority and per source, further assumes that each egress adapter is equipped with an output scheduler (280), the role of which is to select, at each packet cycle, the next packet, temporarily stored in the Egress Buffer (265), due to leave the egress adapter. Egress packet scheduling is a mechanism which is beyond the scope of the invention and is not further discussed other than to mention that its role is normally to serve the waiting packets of highest priorities first while, for each priority, maintaining fairness between the sources of traffic that are independently resequenced.

There is also a similar packet scheduling function (220) in each ingress port-adapter which selects the waiting incoming packets to be switched. Generally, waiting packets are organized under the form of VOQ's (Virtual Output Queues) (230), a scheme well known from the art which prevents priority and port destination HoL blocking in the ingress adapters, so that a waiting incoming packet can neither be blocked by a lower priority packet nor by a packet destined for a busy switch output-port. These are standard functions in switch port-adapters. Packet scheduling (220, 280) and VOQ's (230) are not part of the invention, which does not require any specific behavior from these elements to operate as specified in the rest of the description.

FIG. 2 and the following figures illustrate the invention assuming that the switch is a 64-port switch, so VOQ's have 64 unicast (UC) destinations (0-63) per priority plus the multicast (MC) destination. For this latter case there is, per flow, only one packet sent to one of the switching planes as defined by the load balancing function for this source (205). The switching plane must replicate it to the multiple destinations concerned by the multicast flow to which the packet belongs. It must be noticed that, in contrast with unicast flows where one single counting resource is required per VOQ, in the case of multicast flows no counting resource is required in the Ingress Adapter (200). However, as described above, the required numbering function is performed in the Egress Adapter (275), which inserts, in MC packets, a PSN (packet sequence number), e.g., under the form of a complete ascending sequence n, n+1, n+2, etc., on a per source and per priority basis to stay compatible with UC numbering.

It is to be appreciated that generally switch port-adapters have a standard line or NP (network processor) IN and OUT interface (290), e.g., such as the ones defined by the Network Processing Forum (NPF), 39355 California Street, Suite 307, Fremont, Calif. 94538.

FIG. 3 shows how the Egress Buffering function (265) of FIG. 2 is organized in an embodiment of the invention. Each incoming packet (360) switched through any of the PPS planes is temporarily stored in an egress buffer (365) in an unconditional manner. The egress buffer is typically made of a RAM (Random Access Memory) either internal to an ASIC (Application Specific Integrated Circuit) used to implement the egress port-adapter functions or using commercially available discrete RAM modules controlled by the ASIC. The invention assumes there is enough buffering provided to allow resequencing of all packet flows being handled in the egress adapter at any moment. The upper value to consider is highly dependent on the operation of the switching planes used to implement the PPS structure. Especially, it depends on the number of priorities they are handling and on how much traffic they have to sustain under a worst case scenario of traffic corresponding to the application for which the switching function is devised. A typical parameter influencing the size of the buffer is the burstiness of the traffic, i.e., the probability of having a series of N consecutive packets, at a same level of priority, all destined for the same port. This may be highly disturbing for the rest of the traffic, creating contention and resulting in the holding of lower priority packets in some middle switching planes, thus preventing some flows from being resequenced while packets are already waiting in the egress buffer taking up space. Preferably, the buffer size is dimensioned to allow resequencing under worst case conditions. In practice this is achieved by having a flow control implemented between the various components of the switching function, i.e., the ingress and egress adapters and the individual switch planes. To help reach this objective, a Waiting Packet counter (WPC) and a timer may be implemented as optional features, as described later.

Then, associated to the egress buffer (365), there is a list of free buffers or FBL (Free Buffer List) (370). For each incoming packet (360), a free buffer location is withdrawn (375) from the FBL so that the packet can immediately be stored within the corresponding packet buffer. This is done irrespective of the priority, the rank and the plane through which the data packet arrived in the egress adapter.
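
A minimal sketch of such a free buffer list (Python, behavioral only; the class and method names are illustrative, not taken from the patent):

    class FreeBufferList:
        """FBL (370): hands out free packet buffer locations of the egress buffer."""

        def __init__(self, num_buffers: int) -> None:
            self._free = list(range(num_buffers))  # all locations initially free

        def withdraw(self) -> int:
            """Allocate a location for an incoming packet (375)."""
            if not self._free:
                # In the real design, flow control is meant to prevent this case.
                raise RuntimeError("egress buffer exhausted")
            return self._free.pop()

        def release(self, location: int) -> None:
            """Return a location once its packet has left the adapter."""
            self._free.append(location)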

FIG. 4 shows the hardware resources required by the present invention to implement the resequencing of packets (360) received through the different switching planes. Required are a Content Addressable Memory (CAM) (410) and a set of registers and latches (440). The CAM (410) contains as many entries (435) as there are available packet buffers in the Egress Buffer (365). Thus, there is a one-to-one correspondence between one packet buffer of the Egress Buffer (365) and one CAM entry (435). Each CAM entry (435) consists of two fields: one Search Field (420) and one ID field (430). The ID field contains a packet buffer identification used to identify each packet buffer location in the Egress Buffer (365). In a preferred embodiment of the invention, it is simply the unique buffer address as selected by the FBL (370) of FIG. 3 and used as an ID of the packet while it is stored in the egress buffer. The Search Field (420) is built up from three sub-fields: a source identification field (422), a priority level field (424) and a Packet Sequence Number (PSN) field (426). As already discussed, the PSN is allocated by the unicast Packet Numbering logic (2100 to 2163) for unicast traffic, or by the Multicast Packet Numbering logic (275) for multicast traffic.

As will be further detailed with reference to FIG. 5, the search field is updated at the time a new packet is stored into the Egress Buffer (365), if the algorithm described in FIG. 5 determines that a new entry should be made in the CAM (410).

The set of registers and latches (440) is coupled to the CAM and contains as many registers (450) and latches (455) as there are sources and priority levels. As an example, in an implementation featuring 64 possible sources and 8 priorities with load balancing of unicast traffic over 6 switching planes, 64×8=512 registers (450) and latches (455) are required for unicast traffic. In addition, considering the switch planes as the sources of the MC traffic, 6×8=48 more registers (450) and latches (455) are required for this type of traffic. It is to be noted that, for the sake of clarity, the registers are denoted source-priority registers in the continuing description. The term ‘source’ is to be interpreted either as the ingress adapter for unicast traffic or as the switching plane for multicast traffic. Similarly, for the sake of clarity, the latches are denoted valid-bit latches.

Each valid-bit latch (455) allows setting a valid bit V to indicate to the Packet Scheduler (280) that at least one packet is available for scheduling. This available packet is the one stored in the packet buffer identified by the ID field contained in the corresponding source-priority register (450). The Packet Sequence Number stored in this same source-priority register (450) indicates the current packet number of this packet.

Then, the valid bits contained in the valid-bit latches (455) are used as inputs to the Packet Scheduler logic (280). Once an available packet has been processed by the Packet Scheduler logic (280) and presented over the NPF Interface (290), the corresponding valid bit V is either kept activated or is deactivated, as will be further explained with reference to FIG. 6.

Each source-priority register is made of two mandatory fields to contain a Packet Sequence Number (PSN) and a buffer address ID, and optional fields to contain a Waiting Packet Count (WPC) and a Timer. The source-priority register is detailed later on.
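
Gathering the elements of FIG. 4 into one compact behavioral model (a Python sketch, under the assumption that a dictionary keyed by the concatenated search field adequately mimics a hardware CAM search; all names are illustrative):

    from dataclasses import dataclass

    @dataclass
    class SourcePriorityRegister:
        """One register (450) together with its valid-bit latch (455)."""
        psn: int = 0            # PSN of the last packet scheduled for this flow
        buffer_id: int = 0      # egress buffer location of the packet ready to go
        valid: bool = False     # V bit posted to the Packet Scheduler (280)
        wpc: int = 0            # optional Waiting Packet Count
        timer_running: bool = False  # optional missing-packet timer

    # CAM (410): search field (source, priority, PSN) -> ID field (buffer location).
    cam = {}

    # One register per (source, priority), e.g., (64 + 6) sources x 8 priorities.
    registers = {
        (src, prio): SourcePriorityRegister()
        for src in range(70) for prio in range(8)
    }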

FIG. 5 describes the process of any packet received in the egress adapter through a PPS plane.

The process begins with a packet read out (500) from one of the switching planes (PLn), arrived from a given source (Sn) at a given priority (PTYn). On step 502, the packet is unconditionally stored in the egress buffer at the address obtained from the free buffer list.

Prior to or while storing the packet, its source identifier, its priority and its Packet Sequence Number (PSNi) (as set by the source in the ingress adapter for a unicast packet, or at input in the egress adapter for a multicast packet) are extracted (step 504). Source Sn and Priority PTYn are used as an index to retrieve, on step 506, the corresponding source-priority register content (450)—which contains the previously stored Packet Sequence Number field (PSNc)—with the associated valid-bit latch (455). An optional step (508) allows incrementing the value of a ‘Waiting Packet’ counter (WPC). As will be further explained with reference to FIG. 6, the Waiting Packet counter (WPC) may be considered as a possible solution to monitor the number of packets arriving from one source and waiting to be sent out on the Egress NPF Interface (290). If a packet is missing in a sequence, the WPC increases because the following packets may continue to be received but without being forwarded to the egress NPF Interface (290). Hence, they must stay in the egress buffer, taking up space, until the missing packet is received.

The retrieved valid bit V is next checked on step 510. If V is found inactive (branch 512), then on step 516 the Packet Sequence Number (PSNi) carried in the incoming packet and extracted in step 504 is compared with the current Packet Sequence Number (PSNc) retrieved from the register during step 506. If (branch 518) the Packet Sequence Number (PSNi) of the received packet is exactly the next-in-sequence value after the current Packet Sequence Number (PSNc) stored in the source-priority register (450), i.e., PSNi=PSNc+1, this means that this packet is the expected one, i.e., exactly the one following the last one which was previously scheduled (and transmitted) for the corresponding source and priority. In that case, it is necessary to indicate this new packet as ready for scheduling by the Egress Packet Scheduling logic (280). This is performed on step 526 by updating, in the source-priority register (450), the current PSN with the new value (PSNi) and the ID with the new buffer address, and by setting active the corresponding valid bit V.

Optionally, on step 528, there is the capability to reset the timer value retrieved at step 506, which ends the process of receiving a packet (530).

Going back to step 510, if the valid bit V is found active (branch 514), this means that there is already at least one packet waiting to be scheduled by the Egress Scheduling logic (280). Hence, the CAM is updated (524) by writing, at the egress buffer address or ID obtained on step 502, the three fields (422, 424, 426) Source (Sn), Priority (PTYn) and Packet Sequence Number (PSNi) of the incoming packet. Performing this Write operation makes this new CAM entry retrievable later by a Search operation which may be triggered at a future time, as will be explained later. Then, no further processing is required for this packet (end of process 530).

Going back to step 516, if the Packet Sequence Number (PSNi) of the received packet is not (branch 520) the next-in-sequence value (is not PSNc+1) after the Packet Sequence Number (PSNc) stored in the source-priority register (450), then this packet is not the one following the last one which was previously scheduled (and transmitted), and it cannot be scheduled to depart from the egress adapter (there is still at least one missing packet to be received). In that case, the process follows with step 524 as described above (i.e., a CAM entry must be made for that packet so as to later retrieve it).

As an optional feature of the present invention, on step 522 the timer already mentioned above must be started, or kept running if it was already triggered. As with the WPC, this timer may optionally be used to monitor the re-sequencing of missing packets.
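
The arrival process of FIG. 5 can then be sketched as follows (Python, behavioral only; it reuses the `registers` and `cam` structures of the FIG. 4 sketch above, and the optional WPC/timer steps appear as comments):

    def on_arrival(source: int, prio: int, psn_i: int, buffer_id: int) -> None:
        """Process one packet already stored at buffer_id (steps 500-502).
        Continues the FIG. 4 sketch: uses its `registers` and `cam`."""
        reg = registers[(source, prio)]              # step 506: retrieve register
        reg.wpc += 1                                 # optional step 508
        if not reg.valid:                            # step 510, branch 512
            if psn_i == reg.psn + 1:                 # step 516, branch 518
                # Expected next packet: post it to the scheduler (step 526).
                reg.psn, reg.buffer_id, reg.valid = psn_i, buffer_id, True
                reg.timer_running = False            # optional step 528: reset timer
                return
            reg.timer_running = True                 # optional step 522 (branch 520)
        # Branch 514 or branch 520: record the packet in the CAM (step 524).
        cam[(source, prio, psn_i)] = buffer_id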

FIG. 6 describes the process performed when a packet leaves the egress adapter. The selection of a packet due to leave the adapter is done on the basis of the valid bits posted to the egress packet scheduler (280) to let it know which ones of the source-priority registers actually have a packet, waiting in the egress buffer, that may be forwarded. All valid bits are permanently made available to the scheduler so that the latter has a full view of the waiting packets and thus has all the information it needs to make a decision at each outgoing packet cycle.

As already said, the algorithm by which the scheduler chooses the next packet to go is beyond the scope of the invention, which does not assume any particular method of selection. In general, the waiting packets of the highest priority have precedence; however, at a same level of priority, fairness must be exercised between all sources of traffic (including MC traffic, which has its own sources, i.e., either the ingress MC VOQ's or the switching planes, as already discussed), and exceptions may have to be considered to the strict priority rule if, e.g., one wants to guarantee a minimum bandwidth to lower priority traffic. All of this is highly dependent on the architectural choices that are made to fulfill the requirements of a particular application.

The process begins on step 600 with the selection by the scheduler of one active valid bit. The corresponding register content is retrieved, i.e., the Packet Sequence Number and the ID location corresponding to that valid bit. Then, the packet located at ID in the egress buffer is immediately forwarded to the egress adapter interface (290) and the buffer is released to the FBL.

Next, optionally, the WPC counter of the selected source-priority register is decremented by one (step 602), as there is one packet fewer waiting for transmission in the egress buffer for this source and this priority.

On step 604, a Search operation is initiated in the CAM, with the Search Field (420) set with the source and the priority of the packet that has just gone. The last part of the Search Field is set with the Packet Sequence Number of the selected packet (PSNc) incremented by one, thus performing a search for the next-in-sequence packet. If the Search operation is successful (branch 606), it means that a packet coming from that source, having this priority and with a Packet Sequence Number exactly following that of the packet which has just been scheduled, is already waiting in the Egress buffer. As a result of the Search operation, the buffer address at which this packet has been stored becomes available by performing standard operations of CAMs well known to those skilled in the art. As a reminder, the CAM is written with the fields Source, Priority and Packet Sequence Number at an address identical to the one of the egress buffer which was used to store the packet when it arrived from the plane (step 524).

On step 608, the currently selected source-priority register, indexed by the source and priority, is updated with the Packet Sequence Number incremented by 1. Moreover, the buffer address field is updated with the new address retrieved from the Search operation, and the valid bit is confirmed at its set value. It is to be noted that, in order to guarantee that in a future Search operation having the same search arguments the just obtained address does not show up again, this CAM entry is invalidated (step 610). Then the process ends (step 618).

If the Search operation is not successful (branch 612), then it means that no packet coming from that source, with this priority and having a Packet Sequence Number exactly following that of the packet which has just been scheduled, is waiting in the Egress buffer. Then the corresponding valid bit is reset (step 614) to inhibit any further selection by the Scheduler (280).
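
The matching departure process of FIG. 6 completes the sketch (Python, behavioral only; the dictionary lookup stands in for the CAM Search of step 604, and popping the key stands in for invalidating the matched entry at step 610):

    def on_departure(source: int, prio: int) -> int:
        """Scheduler selected this flow's valid bit (step 600); returns the
        egress buffer location of the packet forwarded to the interface (290).
        Continues the FIG. 4/FIG. 5 sketch: uses `registers` and `cam`."""
        reg = registers[(source, prio)]
        out_buffer = reg.buffer_id               # packet leaves; buffer back to FBL
        reg.wpc -= 1                             # optional step 602
        key = (source, prio, reg.psn + 1)        # step 604: search next-in-sequence
        if key in cam:                           # branch 606: Search successful
            reg.psn += 1                         # step 608: advance the PSN
            reg.buffer_id = cam.pop(key)         # new ID; entry invalidated (610)
            reg.valid = True                     # valid bit confirmed set
        else:                                    # branch 612: Search unsuccessful
            reg.valid = False                    # step 614: inhibit selection
            reg.timer_running = True             # optional step 616
        return out_buffer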

As an optional feature of the present invention, there is on step 616 the capability to start, or to keep running, the timer value retrieved at the first step (600). The purpose of this timer is to provide a means to monitor the time elapsed since the last packet coming from one source for a priority has been scheduled while no in-sequence packet from the same source and same priority has been received. How these timers are processed, and which actions are triggered based on the usage of these timers, is not part of the present invention. The WPC and timer are mentioned here to show how the invention can be straightforwardly accommodated to provide the necessary features to handle error or exception cases such as the loss of packets, or the trapping of lower priority packets in the independent switching planes of a PPS structure. Such cases would result in the accumulation of packets in the egress buffer because too many incomplete sequences of packets, that cannot be forwarded over the egress NPF interface (290), are building up, possibly to a point where the egress adapter would be blocked. Those skilled in the art will recognize how the information provided by WPC's and timers can be used to prevent this from happening.

Finally, there is no further processing (ending step 618). It is worth noting that the Waiting Packet Count (WPC) provides a means of monitoring the number of packets having been sent by one source for one priority and waiting in the Egress buffer, either because the Packet Scheduling logic (280) does not schedule any packet for this source and this priority when the corresponding V bit (455) is active, one reason possibly being that higher priority packets from the same or other sources are to be scheduled, or because the Packet Scheduling logic (280) is not able to schedule any packet for this source and this priority because the corresponding V bit (455) is inactive, meaning that the next packet to schedule (for this source and priority) has not yet been received in the Egress buffer, leading to unsuccessful Search in CAM operations. One can easily imagine that letting Waiting Packet Counts (WPC) increase without any control may lead to Egress buffer saturation and blocking of the system; it is not a purpose of this invention, however, to provide directions for using it.

The above described solution is to be compared to a prior art system having 70×8 linked lists, wherein the head of each list is represented by the 70×8 source-priority registers, each one associated to its valid bit. However, contrary to linked lists, the ‘linking’ with the next packet virtually belonging to the same list is performed only when a source-priority register has been updated after a successful search in the CAM has occurred. As long as there is no successful search, the corresponding ‘linked list’—identified by the source and related priority (together with the current Packet Sequence Number)—is empty. The proposed mechanism has the clear advantage over linked list solutions of being able to store packets independently of the order in which they arrive in the Egress buffer, while this is a much more complex task to perform using linked lists, where the insertion of buffer pointers for new incoming packets among already linked buffers is not an easy task and requires complex pointer operations.

FIG. 7 briefly discusses the problem of the wrapping (700) of the counters used to rank packets at ingress or at egress. Those counters have a finite length; thus, whatever their counting capacity, the problem of their wrapping must be solved. The invention assumes that those counters have one bit more (710) than what is necessary to number the packets. For a given application, the counting capacity (720) must be determined so that the oldest numbered packet still waiting in the egress buffer (730) cannot be wrongly compared with a newly arriving packet (of the same source with the same priority) because the counter used in the source has wrapped in the mean time. Once this value has been determined, the invention assumes that the counters are all made one bit wider so that the numbering of waiting packets cannot span more than one counter wrapping boundary (750). Then, it is easy to take care of the counter wrapping. One solution consists in detecting the first occurrence of a packet number for which the MSB (most significant bit) is found to be 0 (760) after a series of ones, in which case the egress resources must immediately start using the PSN fields with the value of the MSB bit toggled.
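
A small sketch of this wrap handling (Python; the 8-bit width is purely illustrative and stands for the N+1-bit counters described above):

    PSN_BITS = 8                  # N+1 bits in total (illustrative width)
    MODULUS = 1 << PSN_BITS       # counters wrap at 2^(N+1)

    def next_psn(psn: int) -> int:
        """Increment a PSN, wrapping at the counting boundary (700)."""
        return (psn + 1) % MODULUS

    def is_next_in_sequence(psn_i: int, psn_c: int) -> bool:
        """PSNi == PSNc + 1 computed modulo the wrap, so that 0 correctly
        follows MODULUS - 1 when the MSB (760) toggles."""
        return psn_i == next_psn(psn_c)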

Finally, it must be clear to those skilled in the art that the resequencing according to the invention, as described here above in FIGS. 2 to 7, does not require any dedicated resources to implement a transparent switch-over for unicast traffic in case of failure of a switching plane. Indeed, the ingress adapters (load balancing function) may be instructed to skip a plane at any time, in view of its replacement or for any other reason, while all egress adapters keep resequencing transparently, since the scheme according to the invention neither requires that all planes be active nor makes an assumption on the way traffic is load balanced by the ingress adapters, thus meeting the objective of having a free transparent switch-over mechanism for unicast traffic as a result of the use of the invention.

Embodiments may comprise a computer program product. The computer program product may comprise a computer readable medium containing computer readable code including a first instruction module for accessing a buffer in which received data packets are temporarily stored and for extracting the source identifier and the priority level of the stored data packet to select a corresponding source-priority register that contains a packet sequence number (PSN) and a packet buffer location identifier (ID) of a previously received data packet, the source-priority register being associated to a valid-bit latch that indicates an active/not active status. The computer readable code may comprise a second instruction module to check the status of the valid-bit latch and compare the packet sequence number of the received data packet with the packet sequence number contained within the pointed source-priority register to determine if the received data packet is to be output as the next data packet of the corresponding sequence of data packets.

While the invention has been particularly shown and described with references to an embodiment, it will be understood by those skilled in the art that various changes in both form and detail may be made therein without departing from the scope and spirit of the invention.

Having thus described our invention, what we claim is as follows.

1. A system for resequencing received data packets comprising: a first means for storing each received data packet at an allocated location within said first means; a Content Addressable Memory having at least one entry comprising an identification field to contain a packet buffer identifier field to identify an allocated location in said first means and a search field to contain a source identifier of a source providing a received data packet, a priority level for said received data packet and a sequence number for said received data packet, wherein the recited fields are concatenated; a plurality of source-priority registers each containing a packet sequence number and a packet buffer identifier of a data packet previously transmitted from said first means; a plurality of valid-bit latches respectively associated to the plurality of source-priority registers to set an active status to indicate that the corresponding stored data packet is the next one in sequence; and a mechanism for comparing the packet sequence number of said received data packet with the packet sequence number contained within a pointed source-priority register to determine if said received data packet is to be output as the next data packet of the corresponding sequence of data packets; storing the source identifier, the priority level, and the packet sequence number with the packet buffer identifier for said received data packet in said Content Addressable Memory in response to determining that the packet sequence number of said received data packet is not to be output as the next data packet of the corresponding sequence of packets; searching said Content Addressable Memory for the packet sequence number after scheduling output of the next data packet of the corresponding sequence of packets; and updating said source-priority register with the packet sequence number of said received packet and the packet buffer identifier for said received data packet from the Content Addressable Memory.
2. The system of claim 1, wherein the first means for storing comprises a free buffer list to allocate a free packet buffer location to each received data packet.
3. The system of claim 1, wherein the received data packets comprise unicast and multicast data packets.
4. The system of claim 3, wherein each of a plurality of ingress adapters comprises means for numbering the unicast data packets according to the priority level and to at least one egress adapter of each unicast data packet.
5. The system of claim 4, wherein each of the plurality of ingress adapters further comprises means for load balancing the numbered data packets over a plurality of independent switching planes.
6. The system of claim 5, wherein each of the plurality of ingress adapters further comprises means for scheduling the switching of the unicast and multicast data packets over the plurality of independent switching planes.
7. The system of claim 6, wherein the at least one egress adapter further comprises means for numbering multicast data packets according to the priority level of each multicast data packet and to an independent switching plane each multicast data packet has been switched through.
8. A method for resequencing received data packets comprising, for each received data packet: allocating a packet buffer location to the received data packet and temporarily storing said received data packet at said allocated packet buffer location; using a source identifier and a priority level of the stored data packet to point to a corresponding source-priority register that contains a packet sequence number and a packet buffer location identifier of a previously received data packet, the source-priority register being associated to a valid-bit latch that indicates an active or not active status; checking the status of the valid-bit latch; comparing the packet sequence number of the received data packet with the packet sequence number contained within the pointed source-priority register to determine if the received data packet is to be output as the next data packet of the corresponding sequence of data packets; storing the source identifier, the priority level, and the packet sequence number with the packet buffer identifier for the received data packet in a Content Addressable Memory in response to determining that the packet sequence number of the received data packet is not to be output as the next data packet of the corresponding sequence of packets; searching the Content Addressable Memory for the packet sequence number after scheduling output of the next data packet of the corresponding sequence of packets; and updating the source-priority register with the packet sequence number of the received packet and the packet buffer identifier for the received data packet from the Content Addressable Memory.
9. The method of claim 8, wherein the checking further comprises: if the status is not active: updating the pointed source-priority register with the packet sequence number and the packet buffer location identifier of the received data packet, only if the packet sequence number of the received data packet is the next in sequence; and setting the status of the valid-bit latch to active; otherwise, if the status is active: said storing comprises writing in the Content Addressable Memory the source identifier, the priority level and the packet sequence number of the received data packet, the write address being identified by the packet buffer location allocated to the received data packet.
10. The method of claim 8, further comprising incrementing a ‘waiting packet’ counter.
11. The method of claim 8, further comprising scheduling the output of the received data packet from at least one egress adapter.
12. The method of claim 11, further comprising decrementing a ‘waiting packet’ counter after transmitting the received data packet.
13. The method of claim 12, further comprising: keeping the status of the valid-bit latch to active; and invalidating the searched CAM entry; otherwise, if the searching does not match, resetting the status of the valid-bit latch of the pointed source-priority register.
14. The method of claim 13, further comprising, after the resetting, starting a timer.
15. The method of claim 9, further comprising scheduling the output of the received data packet from at least one egress adapter.
16. The method of claim 15, further comprising decrementing the ‘waiting packet’ counter after transmitting the received data packet.
17. The method of claim 16, further comprising: keeping the status of the valid-bit latch to active; and invalidating the searched CAM entry; otherwise, if the search does not match, resetting the status of the valid-bit latch of the pointed source-priority register.
18. The method of claim 17, further comprising, after the resetting, starting a timer.
19. A system comprising: a buffer in which received data packets are temporarily stored; a controller programmed to use a source identifier and a priority level of the stored data packet to select a corresponding source-priority register that contains a packet sequence number and a packet buffer location identifier of a previously received data packet, the source-priority register being associated with a valid-bit latch that indicates an active or not active status; and said controller programmed to check the status of the valid-bit latch; to compare the packet sequence number of the received data packet with the packet sequence number contained within the pointed source-priority register to determine if the received data packet is to be output as the next data packet in the corresponding sequence of data packets; to store the source identifier, the priority level, and the packet sequence number with the packet buffer identifier for said received data packet in a Content Addressable Memory in response to determining that the packet sequence number of said received data packet is not to be output as the next data packet of the corresponding sequence of packets; to search said Content Addressable Memory for the packet sequence number after scheduling output of the next data packet of the corresponding sequence of packets; and to update said source-priority register with the packet sequence number of said received packet and the packet buffer identifier for said received data packet from said Content Addressable Memory.
20. A computer program product comprising: a computer readable medium containing computer readable code, the computer program product further comprising a first instruction module for accessing a buffer in which received data packets are temporarily stored and for using a source identifier and a priority level of the stored data packet to select a corresponding source-priority register that contains a packet sequence number and a packet buffer location identifier of a previously received data packet, the source-priority register being associated to a valid-bit latch that indicates an active or not active status; and a second instruction module to check the status of the valid-bit latch; to compare the packet sequence number of the received data packet with the packet sequence number contained within the pointed source-priority register to determine if the received data packet is to be output as the next data packet of the corresponding sequence of data packets; to store the source identifier, the priority level, and the packet sequence number with the packet buffer identifier for said received data packet in a Content Addressable Memory in response to determining that the packet sequence number of said received data packet is not to be output as the next data packet of the corresponding sequence of packets; to search said Content Addressable Memory for the packet sequence number after scheduling output of the next data packet of the corresponding sequence of packets; and to update said source-priority register with the packet sequence number of said received packet and the packet buffer identifier for said received data packet from said Content Addressable Memory.